Grid Workflow Software for a High-Throughput Proteome Annotation Pipeline
Abstract
The goal of the Encyclopedia of Life (EOL) Project is to predict structural information for all proteins in all organisms. This calculation presents challenges both in the scale of the computational resources required (approximately 1.8 million CPU hours) and in data and workflow management. While existing tools solve subsets of these problems, we had to build software to integrate and manage the execution of the overall Grid application. In this paper, we present this workflow system, detail its components, and report on the performance of our initial prototype implementation for runs over a large-scale Grid platform during the SC’03 conference.
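The abstract does not describe the internals of the workflow system, so the following is only a rough illustration of the kind of bookkeeping such a manager performs. It assumes the proteome can be split into independent per-protein tasks that are farmed out to compute resources and retried on failure. `run_workflow`, `Task`, `State`, and `flaky_annotate` are hypothetical names, and a local thread pool stands in for real Grid job submission.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Dict, List


class State(Enum):
    PENDING = "pending"
    DONE = "done"
    FAILED = "failed"


@dataclass
class Task:
    """Hypothetical unit of work, e.g. annotating a single protein."""
    task_id: str
    state: State = State.PENDING
    attempts: int = 0
    result: object = None


def run_workflow(
    task_ids: List[str],
    worker: Callable[[str, int], object],
    max_attempts: int = 3,
    max_workers: int = 4,
) -> Dict[str, Task]:
    """Farm out independent tasks to a worker pool, retrying failures.

    `worker(task_id, attempt)` is an assumed stand-in for submitting a job
    to a Grid resource: it returns a result or raises on failure.
    """
    tasks = {tid: Task(tid) for tid in task_ids}
    pending = list(task_ids)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while pending:
            # Submit every pending task for its next attempt.
            futures = {}
            for tid in pending:
                tasks[tid].attempts += 1
                futures[pool.submit(worker, tid, tasks[tid].attempts)] = tid
            pending = []
            # Collect results as jobs finish; requeue failures that have attempts left.
            for fut in as_completed(futures):
                task = tasks[futures[fut]]
                try:
                    task.result = fut.result()
                    task.state = State.DONE
                except Exception:
                    if task.attempts < max_attempts:
                        pending.append(task.task_id)
                    else:
                        task.state = State.FAILED
    return tasks


def flaky_annotate(protein_id: str, attempt: int) -> str:
    # Simulated annotation step: "P2" fails on its first attempt, "P3" always fails.
    if protein_id == "P3" or (protein_id == "P2" and attempt == 1):
        raise RuntimeError(f"job for {protein_id} failed")
    return f"annotated:{protein_id}"


tasks = run_workflow(["P1", "P2", "P3"], flaky_annotate, max_attempts=3)
for tid, task in tasks.items():
    print(tid, task.state.value, task.attempts)
# prints:
# P1 done 1
# P2 done 2
# P3 failed 3
```

Retrying at the granularity of a single task keeps one failed job from stalling the whole run. A production system at EOL scale would likely also need to persist task state so that a long campaign can resume if the manager itself restarts.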
Related articles
Grid Portal Interface for Interactive Use and Monitoring of High-Throughput Proteome Annotation
High-throughput proteome annotation refers to the activity of extracting information from all proteins in a particular organism using bioinformatics software on a high performance computing platform such as the grid. The Encyclopedia of Life (EOL) project [1] aims to catalog all proteins in all species for public benefit using an Integrative Genome Annotation Pipeline [2] (iGAP). The intrinsic...
Bioinformatics for plant genome annotation
High-throughput sequencing must be matched by high-throughput annotation. Given the large number of annotation tools available, a multitude of interdependent analyses are required for an in-depth annotation of even a single BAC sequence. Special annotation pipeline software is required to make such annotation processes feasible in an automated fashion. In terms of functionality, such software s...
Simple high-throughput annotation pipeline (SHAP)
Summary: SHAP (simple high-throughput annotation pipeline) is a lightweight and scalable sequence annotation pipeline capable of supporting research efforts that generate or utilize large volumes of DNA sequence data. The software provides Grid-capable analysis, relational storage and Web-based full-text searching of annotation results. Implemented in Java, SHAP recognizes the limited resources ...
An integrated pipeline for protein classification using specific PSSMs and existing protein annotations
Protein classification has been performed by many protein databases to infer annotations of unknown proteins and thereby enhance the performance of protein annotation. In this study, we implemented an integrated pipeline for protein classification using specific PSSMs and proteins with the same entity name. After clustering sequences on the basis of their evolutionary distances, a target grou...
Mapping of Scientific Workflow within the e-Protein project to Distributed Resources
The e-Protein project, a BBSRC pilot project, aims to examine the issues in building a structure-based annotation of the proteins in the major genomes by linking resources (computing, software and databases) at three sites using Grid technologies. This paper describes the implementation of the Imperial College annotation pipeline (3D-GENOMICS) within ICENI. The scientific problem of large-scale...
Publication date: 2004